Data on extreme weather events and on casualties from extreme weather was obtained from the National Climatic Data Center archive (one data set each). Each set comprises yearly data for the period 1950-2017. A Python script was used to scrape all data sets. First, all data sets (extreme weather events and casualties from extreme weather) are loaded and combined into a master data frame (df_events_victims), which is used for all downstream analyses. For more information, please refer to the readme file (readme.md).
# Load all csv files and combine victims_data with storm_events via EVENT_ID
combine_data = function(range, path_events = "storm_data/events_data/", path_victims = "storm_data/victims_data/"){
  combined_events = data.frame()
  combined_victims = data.frame()
  for(year in range){
    for(csv_file in list.files(path_events)){
      # regex match on the year, so that data can be loaded for a particular period (start year-end year).
      # Requires zip files and csv files to be in separate folders; otherwise the regex matches both
      # and the final data frame will contain duplicated data.
      regexpr = regexec(as.character(year), csv_file)
      match = length(regmatches(csv_file, regexpr)[[1]])
      if(match){
        # exclude the last 3 columns (episode_narrative, event_narrative, data_source)
        file_in_events = read.csv(paste(path_events, csv_file, sep=""), colClasses = c(rep(NA,48), rep("NULL",3)))
        # rename column MONTH_NAME to MONTH and convert the month string to an integer
        colnames(file_in_events)[12] = "MONTH"
        file_in_events$MONTH = month(dmy_hms(file_in_events$BEGIN_DATE_TIME, locale = "en_US.UTF-8"))
        # include only the FATALITY_AGE and FATALITY_SEX columns (plus the columns needed for the join)
        file_in_victims = read.csv(paste(path_victims, gsub("Storm_Event", "Storm_Fatalities", csv_file), sep=""), colClasses = c(rep("NULL",4), NA, rep("NULL",2), rep(NA,3), "NULL"))
        if(is.integer(file_in_events$DAMAGE_CROPS)){
          # bind_rows() tries to convert to factor if the column type is integer, which throws an error; convert to character
          file_in_events$DAMAGE_CROPS = as.character(file_in_events$DAMAGE_CROPS)
        }
        combined_events = bind_rows(combined_events, file_in_events)
        combined_victims = bind_rows(combined_victims, file_in_victims)
      }
    }
  }
  return(left_join(combined_events, combined_victims, by = "EVENT_ID"))
}
# providing range in years in the combine_data() function defines which years to include in the master data frame
df_events_victims = combine_data(1950:2017)
# optional, write new csv file
# write.csv(df_events_victims, file = "storm_data/all_storm_events_victims.csv", row.names=FALSE)
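As a minimal sketch (with made-up rows) of how the EVENT_ID join behaves: left_join() keeps every event and attaches matching victim records, so an event with several recorded fatalities appears once per fatality, while events without fatalities get NA.

```r
library(dplyr)

# Toy data (hypothetical values), illustrating the EVENT_ID join used above
events  <- data.frame(EVENT_ID = c(1, 2, 3), EVENT_TYPE = c("Hail", "Tornado", "Flood"))
victims <- data.frame(EVENT_ID = c(2, 2), FATALITY_AGE = c(34, 67))

merged <- left_join(events, victims, by = "EVENT_ID")
nrow(merged)  # 4: event 2 appears once per fatality; events 1 and 3 get NA for FATALITY_AGE
```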
There is compelling evidence for the existence of climate change (e.g. https://climate.nasa.gov/evidence/). However, it is very challenging to determine the long-term social and economic consequences for the global community. Because the transformation of climate change into socioeconomic effects takes place gradually, over years, decades and centuries, it becomes quite abstract and difficult for society to grasp. This is reinforced by the fact that leading industrial countries can still cope with these consequences by means of technological advancement and economic power, and above all by the earth’s capability of resource regeneration in combination with existing resource capacities. However, the earth’s buffering capacity is limited, especially if the lifestyle of western industrial nations does not change (moreover, be aware that less developed countries, which are prevalent in terms of quantity, seek to reach the same economic status). As such, I strongly believe that society (including myself!) is given a false sense of security and tends to be ignorant of the true consequences for all the generations to come after us. Nevertheless, I also strongly believe that we can make the global transition from a non-sustainable to a sustainable society.
Therefore, I am very motivated to explore the data provided and translate the relevant information in an insightful way, in order to make climate change and its consequences less abstract and thus contribute to raising awareness of this very important topic that affects each one of us. More precisely, I use time series data on global temperatures, carbon dioxide concentration and extreme weather events, aiming to analyse whether such extreme conditions have become more frequent over the years and whether there is a correlation with global warming. The analysis focuses on the United States.
Let’s visualize the unique categories of storm events present in the dataset, and plot the total number of occurrences from 1950-2017 for each category.
options(scipen=999) # scipen is used to suppress scientific notation in R (e+ notation for numbers); for more information check R help
ggplot(aes(x=EVENT_TYPE, fill = "red"), data = df_events_victims) + geom_bar() +
guides(fill=guide_legend(ncol=8)) +
scale_y_sqrt(breaks=c(0,50000,100000,200000,300000,400000)) +
xlab("Type of Extreme Weather")+
ylab("Number of Events") +
geom_hline(yintercept = 50000, linetype = 2, alpha = 0.3) +
ggtitle("Total Number of Extreme Weather Events by default Categories")+
guides(fill=F) +
theme_minimal() +
theme(plot.title= element_text(hjust=0.5, size = 14, face = "bold"), axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
axis.text.y = element_text(size = 10), axis.title =element_text(size=12))
Hail and Thunderstorm Wind are the most frequent categories over the period 1950-2017. In general, “less dramatic” manifestations of extreme weather belonging to categories related to “Storm”, “Flood” and “Winter Weather” are overall the most frequent. This is to be expected, since some events, like tsunamis or volcanic eruptions, are simply less likely to be triggered than the aforementioned events. The very high total number of Tornado occurrences (the dashed line in the above plot depicts count = 50000) really surprised me (probably because tornadoes are extremely rare in Europe). In contrast, I am surprised that no earthquakes were documented. It is evident that there is some redundancy at the level of event type; e.g., there are multiple subcategories for Thunderstorms, Floods, Hail etc. For further analysis with respect to storm event types, I will reduce redundancy by grouping events. This will ensure a better overview in all subsequent analyses that include weather events.
Before deciding on the grouping pattern, I have to decide whether some events should be excluded from the data set. In particular, I am only interested in analysing the dynamics of extreme weather events that, according to present knowledge, can be correlated with negative effects on society, nature and the economy. However, even high Northern Lights activity has a documented connection to such effects, namely the collapse of the Quebec power grid (http://www.bbc.com/future/story/20130410-the-greatest-show-on-earth). The same is true for tides, which are described as having far-reaching consequences for ecosystems and society (https://oceanservice.noaa.gov/education/pd/tidescurrents/tidescurrents_effects_influences.html). As such, I won’t exclude any event type.
Weather categories proposed by the NSSL (http://www.nssl.noaa.gov/education/svrwx101/) were extended by the groups “water column” and “other” to define the following major groups:
The semantics of the recorded event types were considered to assign each event to the appropriate group. E.g., all names containing the word “Thunderstorm” were combined into the group “Storm”, as was any event having the characteristics of a storm, such as “Hurricane” and “Lightning”. Of course, the underlying classification is to some extent arbitrary, and other classifications are also possible. Single events with no obvious association to the first eight categories were grouped into the “Other” group.
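The semantic rule can be sketched as a simple pattern match (illustrative only; the actual grouping below enumerates the event types explicitly):

```r
# Illustrative sketch of the semantic rule for the "Storm" group; the analysis
# below uses explicit lookup vectors instead of pattern matching.
is_storm_like <- function(event){
  grepl("Thunderstorm|Hurricane|Lightning|Tropical|Heavy Rain", event, ignore.case = TRUE)
}
is_storm_like(c("THUNDERSTORM WINDS/FLOODING", "Marine Lightning", "Hail"))
# TRUE TRUE FALSE
```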
category_storm = c("Thunderstorm Wind", "THUNDERSTORM WINDS/FLOODING", "THUNDERSTORM WINDS/FLASH FLOOD", "THUNDERSTORM WINDS LIGHTNING", "THUNDERSTORM WIND/ TREES", "THUNDERSTORM WIND/ TREE", "THUNDERSTORM WINDS FUNNEL CLOU", "THUNDERSTORM WINDS/HEAVY RAIN", "THUNDERSTORM WINDS HEAVY RAIN", "THUNDERSTORM WINDS/ FLOOD", "Lightning", "Tropical Storm", "Hurricane (Typhoon)", "Heavy Rain", "Marine Thunderstorm Wind", "Tropical Depression", "Hurricane", "Marine Tropical Storm", "Marine Hurricane/Typhoon", "Marine Lightning", "Marine Tropical Depression")
category_flood = c("Flash Flood", "Flood", "Coastal Flood", "Storm Surge/Tide","Tsunami", "Lakeshore Flood")
category_hail = c("Hail", "HAIL/ICY ROADS","HAIL FLOODING", "Marine Hail")
category_tornado =c("Tornado", "TORNADOES, TSTM WIND, HAIL", "TORNADO/WATERSPOUT", "Funnel Cloud", "Waterspout", "Dust Devil")
category_wind = c("High Wind", "Strong Wind", "Marine High Wind", "Heavy Wind", "Marine Strong Wind")
category_winter = c("Winter Storm", "Blizzard", "Cold/Wind Chill", "Heavy Snow", "Ice Storm", "Winter Weather", "Avalanche", "Frost/Freeze", "Freezing Fog", "Sleet", "Extreme Cold/Wind Chill", "High Snow")
category_drought = c("Heat", "Wildfire", "Dust Storm", "Drought", "Excessive Heat")
category_tide = c("High Surf", "Rip Current", "Astronomical Low Tide", "Seiche", "Sneakerwave")
category_other = c("Debris Flow", "Lake-Effect Snow", "Volcanic Ash", "OTHER", "Northern Lights", "Dense Smoke", "Landslide", "Volcanic Ashfall", "Marine Dense Fog", "Dense Fog")
# hash the categories (similar to a dictionary object in Python); allows easy and fast categorization of sub-categories (e.g., "Heat") into the corresponding main category (e.g., "Drought")
major_categories = hash(Storm=category_storm, Flood=category_flood, Hail=category_hail, Tornado=category_tornado, Wind=category_wind, Winter=category_winter ,Drought=category_drought, Tide=category_tide, Other=category_other)
# retrieve the main category for each extreme weather sub-category
classification = function(event){
  output = NULL
  for(key in keys(major_categories)){
    if(event %in% major_categories[[key]]){
      output = key
    } else if(is.null(output)){
      output = NA
    }
  }
  output
}
# apply new categorization of extreme weather events to the main data set
df_events_victims["EVENT_CATEGORY"] = apply(subset(df_events_victims,select = c("EVENT_TYPE")), 1, function(x) classification(x))
p1 = ggplot(aes(x=EVENT_CATEGORY, fill = "red"), data = df_events_victims) + geom_bar() +
theme(legend.position="bottom", axis.text.x = element_text(angle = 45, hjust = 1)) +
guides(fill=guide_legend(ncol=8)) +
scale_y_sqrt() +
xlab("Category") +
ylab("Number of Events") +
ggtitle("Total Number of Extreme Weather Events\nby new Categorization") +
guides(fill=F) +
theme_minimal() +
theme(plot.title= element_text(hjust=0.5, size = 14, face = "bold"), axis.text.x = element_text(angle = 45, hjust = 1,size = 10),
axis.text.y = element_text(size = 10), axis.title =element_text(size=12))
# load "RdYlBu" color palette with 9 distinct colors (for each category)
getPalette = colorRampPalette(brewer.pal(9,"RdYlBu"))
p2 = ggplot(aes(x=MONTH), data = df_events_victims) + geom_bar(aes(fill = EVENT_CATEGORY)) +
xlab("Month") +
ylab("Number of Events") +
ggtitle("Seasonal Distribution for Extreme Weather Categories")+
theme_minimal() +
scale_x_continuous(limits = c(0,13), breaks = seq(1,12,1), labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
scale_fill_manual(values=getPalette(9), guide = guide_legend(title = "Category")) +
theme(plot.title= element_text(hjust=0.5, size = 14, face = "bold"), axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
axis.text.y = element_text(size = 10), axis.title =element_text(size=12), legend.text=element_text(size=10), legend.title=element_text(size=12))
grid.arrange(p1,p2,ncol=2, widths=c(5,6))
After merging similar events into major categories, we can see that events related to Hail, Storm and Winter weather are by far the most frequent over the period 1950-2017, followed by events related to Flood, Drought, Tornado and Wind. The least frequent events belong to the major categories Tide and Other. Let’s analyse the seasonal distribution of each event category. As we can see, different types of extreme weather have specific seasonal distributions: e.g., the major Storm season is between May and August, Hail predominantly occurs between April and July, and, as expected, Winter events mostly occur in the winter months December, January and February.
The data set provides information about injuries and the death toll caused by extreme weather events. I want to analyse the distribution of such casualties (average casualties etc.) and how many have been recorded in total.
# combines direct and indirect casualties for each type (injuries vs. fatalities)
casualties_reshape = subset(df_events_victims, select = c("STATE", "STATE_FIPS","INJURIES_DIRECT", "INJURIES_INDIRECT", "DEATHS_DIRECT", "DEATHS_INDIRECT", "FATALITY_AGE", "FATALITY_SEX")) %>%
# state fips have two-digit format; formatC() preserves 2-digit format and adds a 0 if entry is of 1-digit type
transform(FIPS = formatC(STATE_FIPS, digits=2,width=2,format = "d", flag=0) ) %>%
# converts NA to 0 for one casualty category (here, indirect vs. direct) if other category is not NA; This avoids NA result when summing number and NA
mutate_at(vars(DEATHS_DIRECT), funs(ifelse(!is.na(DEATHS_INDIRECT) & (is.na(DEATHS_DIRECT) | DEATHS_DIRECT==0),0, DEATHS_DIRECT ))) %>%
mutate_at(vars(DEATHS_INDIRECT), funs(ifelse(!is.na(DEATHS_DIRECT) & (is.na(DEATHS_INDIRECT) | DEATHS_INDIRECT==0),0, DEATHS_INDIRECT ))) %>%
mutate_at(vars(INJURIES_DIRECT), funs(ifelse(!is.na(INJURIES_INDIRECT) & (is.na(INJURIES_DIRECT) | INJURIES_DIRECT==0),0, INJURIES_DIRECT ))) %>%
mutate_at(vars(INJURIES_INDIRECT), funs(ifelse(!is.na(INJURIES_DIRECT) & (is.na(INJURIES_INDIRECT) | INJURIES_INDIRECT==0),0, INJURIES_INDIRECT ))) %>%
transform(Fatalities = DEATHS_DIRECT + DEATHS_INDIRECT, Injuries = INJURIES_DIRECT + INJURIES_INDIRECT)
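A toy check (made-up rows) of the NA-handling rule above: a missing direct or indirect count is set to 0 only when its counterpart was recorded, so the row sum stays defined for events with any record and remains NA otherwise.

```r
library(dplyr)

# Hypothetical rows: direct only recorded, indirect only recorded, nothing recorded
x <- data.frame(DEATHS_DIRECT = c(2, NA, NA), DEATHS_INDIRECT = c(NA, 1, NA))

x <- x %>%
  mutate(DEATHS_DIRECT   = ifelse(!is.na(DEATHS_INDIRECT) & is.na(DEATHS_DIRECT), 0, DEATHS_DIRECT)) %>%
  mutate(DEATHS_INDIRECT = ifelse(!is.na(DEATHS_DIRECT) & is.na(DEATHS_INDIRECT), 0, DEATHS_INDIRECT)) %>%
  mutate(Fatalities = DEATHS_DIRECT + DEATHS_INDIRECT)

x$Fatalities  # 2 1 NA -- only the fully unrecorded row remains NA
```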
# total counts for each category (injuries vs. fatalities)
casualties_total = casualties_reshape %>%
select(c("Fatalities", "Injuries")) %>%
gather(key = "casualty", value = "count",1:2) %>%
group_by(casualty) %>%
summarise(total_counts = sum(count))
# number of casualties for each extreme weather event (used to plot the distribution of casualties)
casualties_by_category = casualties_reshape %>%
select(c("Fatalities", "Injuries")) %>%
gather(key = "casualty", value = "count",1:2)
# number of casualties by age and gender
casualties_by_age_gender = subset(casualties_reshape, !is.na(casualties_reshape$FATALITY_AGE)) %>%
select(c("FATALITY_SEX","FATALITY_AGE", "Fatalities", "Injuries")) %>%
filter(FATALITY_SEX != "") %>%
gather(key = "casualty", value = "count", Fatalities:Injuries) %>%
group_by(FATALITY_AGE,FATALITY_SEX, casualty) %>%
summarise(counts=sum(count)) %>%
ungroup()
p3=ggplot(aes(x = casualty, y=total_counts), data = casualties_total) + geom_col(aes(fill = casualty)) +
scale_fill_manual(values=getPalette(2), guide = guide_legend(title = "Casualty Type"))+
xlab("Casualty Type")+
ylab("Number of Total Casualties")+
scale_x_discrete(labels=c("Fatalities","Injuries")) +
scale_y_sqrt(lim=c(0,600000),breaks=c(1000, 10000, 100000, 500000)) +
theme_minimal() +
theme(axis.text = element_text(size = 10), axis.title =element_text(size=12), legend.text=element_text(size=10), legend.title=element_text(size=12))+
guides(fill=F)
# +1 transformation of x-values to cope with log10 transformation of x-axis (this will include x-values = 0)
p4=ggplot(aes(x = count+1), data = casualties_by_category) +
geom_histogram(aes(fill=casualty),binwidth=0.04) +
# breaks are each shifted +1 because of +1 transformation of x-values, however, labels are -1 rescaled to represent the correct values on the x-axis
scale_x_log10(lim=c(0.7,10000),breaks=c(1,2,11,71,162,1001,10001), labels=c(0,1,10,70,161,1000,10000)) +
xlab("Number of Casualties") +
ylab("Counts") +
scale_y_log10(breaks=c(10,100, 1000, 10000, 100000, 1000000)) +
scale_fill_manual(values=getPalette(2), guide = guide_legend(title = "Casualty Type")) +
scale_color_manual(name = "Events with casualties", values=c("95th percentile"="black", "5th percentile"="black")) +
scale_linetype_manual(name = "Events with casualties", values = c("95th percentile"=1, "5th percentile"=2)) +
theme_minimal() +
theme(strip.text = element_text(size=12), axis.text = element_text(size = 10), axis.title =element_text(size=12), legend.text=element_text(size=10), legend.title=element_text(size=12))+
facet_wrap(~casualty)
# add lines depicting 5th and 95th percentile for events with recorded casualties (excludes events with no casualty records)
p4 = p4 + geom_vline(data = subset(casualties_by_category, casualty=="Fatalities" & count != 0), aes(color = "95th percentile", xintercept = quantile(count, probs=0.95), linetype = "95th percentile")) +
  geom_vline(data = subset(casualties_by_category, casualty=="Fatalities" & count != 0), aes(color = "5th percentile", xintercept = quantile(count, probs=0.05), linetype = "5th percentile"))
p4 = p4 + geom_vline(data = subset(casualties_by_category, casualty=="Injuries" & count != 0), aes(color = "95th percentile", xintercept = quantile(count, probs=0.95), linetype = "95th percentile")) +
  geom_vline(data = subset(casualties_by_category, casualty=="Injuries" & count != 0), aes(color = "5th percentile", xintercept = quantile(count, probs=0.05), linetype = "5th percentile"))
grid.arrange(p4,p3,widths=c(22,6), ncol=2, top=textGrob("Distribution of Casualty Counts", gp=gpar(fontsize=14, face = "bold")))
The distribution of casualties is highly right-skewed for both fatalities and injuries. For more than 1000000 of the total 1442330 extreme weather events, no casualties were recorded. For weather events with recorded casualties, the number of fatalities ranges from 1 to 161, whereas the number of injuries ranges from 1 to 70 (these ranges cover 90% of the data with recorded casualties). Moreover, both categories include a small number of events with substantially higher casualty numbers than the majority of events in the cognate group, e.g. events with approximately 800 and almost 5000 casualties in the Fatalities and Injuries group, respectively. With predominantly 2-100 incidents per event, the distribution of casualty numbers is somewhat deceiving in terms of the perception of total casualty numbers. However, if we depict the sum of all casualties, the full scope of the consequences associated with extreme weather events becomes much clearer: more than 500000 casualties were recorded for the period 1950-2017!
Now let’s include demographic statistics and analyse casualties by age and gender. For different age groups, I would expect to observe casualty numbers that follow the general population distribution for the United States.
ggplot(aes(x=FATALITY_AGE, y =counts, group=FATALITY_SEX), data = casualties_by_age_gender) +
ggtitle("Casualties by Age and Gender") +
geom_line(aes(color=FATALITY_SEX),stat = "summary", fun.y = sum) +
scale_y_sqrt() +
scale_x_continuous(lim=c(0,100), breaks=seq(0,100,5))+
xlab("Age (Years)")+
ylab("Number of Casualties")+
geom_smooth(aes(color=FATALITY_SEX)) +
scale_color_manual(values=(getPalette(2)), guide = guide_legend(title = "Gender"), labels=c("Female", "Male"))+
facet_wrap(~casualty) +
theme_minimal()+
theme(strip.text = element_text(size=12), plot.title= element_text(hjust=0.5, size = 14, face = "bold"), axis.text = element_text(size = 10),
axis.title =element_text(size=12),legend.text=element_text(size=10), legend.title=element_text(size=12))
# for each casualty type (injuries vs. fatalities), calculate the Pearson correlation coefficient for ages up to 70 and over 70, respectively
cor_fat_0_70 = tidy(with(subset(casualties_by_age_gender, casualties_by_age_gender$FATALITY_AGE <= 70 & casualties_by_age_gender$casualty == "Fatalities"), cor.test(FATALITY_AGE,counts)))
cor_inj_0_70 = tidy(with(subset(casualties_by_age_gender, casualties_by_age_gender$FATALITY_AGE <= 70 & casualties_by_age_gender$casualty == "Injuries"), cor.test(FATALITY_AGE,counts)))
cor_fat_70_ = tidy(with(subset(casualties_by_age_gender, casualties_by_age_gender$FATALITY_AGE > 70 & casualties_by_age_gender$casualty == "Fatalities"), cor.test(FATALITY_AGE,counts)))
cor_inj_70_ = tidy(with(subset(casualties_by_age_gender, casualties_by_age_gender$FATALITY_AGE > 70 & casualties_by_age_gender$casualty == "Injuries"), cor.test(FATALITY_AGE,counts)))
df_cor = data.frame() %>% bind_rows(cor_fat_0_70,cor_inj_0_70,cor_fat_70_,cor_inj_70_)
rownames(df_cor) = c("Fatalities age 0-70", "Injuries age 0-70", "Fatalities age 70+", "Injuries age 70+")
df_cor
## estimate statistic p.value
## Fatalities age 0-70 0.7323805 12.726831 0.0000000000000000000000003908692
## Injuries age 0-70 0.5085346 6.988120 0.0000000001039410108505342153734
## Fatalities age 70+ -0.6184963 -6.096878 0.0000000845980723155704863235678
## Injuries age 70+ -0.6286802 -6.262005 0.0000000446682687877821317499811
## parameter conf.low conf.high
## Fatalities age 0-70 140 0.6455322 0.8005295
## Injuries age 0-70 140 0.3752411 0.6212224
## Fatalities age 70+ 60 -0.7520823 -0.4360966
## Injuries age 70+ 60 -0.7592307 -0.4494934
## method alternative
## Fatalities age 0-70 Pearson's product-moment correlation two.sided
## Injuries age 0-70 Pearson's product-moment correlation two.sided
## Fatalities age 70+ Pearson's product-moment correlation two.sided
## Injuries age 70+ Pearson's product-moment correlation two.sided
As evident from the figure Casualties by Age and Gender and the results of Pearson’s correlation test, casualty numbers are positively correlated with age up to the age of approximately 70. From the age of 70 onward, the correlation between casualty numbers and age is negative. This holds for both categories (Fatalities and Injuries); however, the positive correlation is stronger for Fatalities. These trends could merely reflect the age demographics of the United States, or alternatively imply differences in risk between age groups. So let’s include data on the age distribution in the United States to clarify this.
# group casualties by age groups
casualties_by_age_gender["Age_groups"] = as.character(cut(casualties_by_age_gender$FATALITY_AGE, breaks=c(0,19, 39, 59,79, 89, max(casualties_by_age_gender$FATALITY_AGE))))
# read the census data (combined male and female data from 1990-2017) and calculate average population percentage for each age group
census_data = read.csv("census_usa.csv", check.names = F)
census_data_mean = apply(census_data[,-1],2,function(x)mean(x))
# rename the entries of census_data_mean to the same names used in the "Age_groups" variable of the casualties_by_age_gender data set; this way, we can use the hashed census data to incorporate
# the population percentage for each corresponding age group into the casualties_by_age_gender data set
names(census_data_mean)[order(names(census_data_mean))] = pull(unique(na.omit( casualties_by_age_gender["Age_groups"])))
# make a python dictionary-like object with age groups as keys and the corresponding average population percentage as values
census_data_mean_dict = hash()
for(age_group in names(census_data_mean)){
census_data_mean_dict[age_group] = census_data_mean[age_group]
}
# function used to retrieve population percentage for each corresponding Age group
census_classification = function(group){
  if(is.na(group)){
    output = NA
  } else {
    output = census_data_mean_dict[[group]]
  }
  output
}
# include the corresponding population percentage from census data for the cognate age group in the data set
casualties_by_age_gender["census"] = apply(subset(casualties_by_age_gender, select = c("Age_groups")),1, function(x) census_classification(x))
casualties_by_age_groups = subset(casualties_by_age_gender,!is.na(casualties_by_age_gender$Age_groups)) %>%
group_by(Age_groups,census) %>%
summarise(group_counts = sum(counts)) %>%
transform(expected_counts = (census/100)*sum(group_counts) ) %>%
ungroup() %>%
select(c("Age_groups", "group_counts", "expected_counts"))
# reshape data into long format to plot observed counts and expected counts
casualties_by_age_groups_reshape = melt(casualties_by_age_groups)
# calculate the relative risk for each age group
# 1. take the bigger of expected values and observed values as numerator and calculate the ratio
# 2. convert ratio to negative value if the expected value is bigger than the observed values
# 3. decreased risk will have ratio value < 1 and increased risk will have ratio value > 1
casualties_by_age_groups["ratio"] = apply(casualties_by_age_groups[,2:3],1,function(x){max(x)/min(x)})
casualties_by_age_groups = casualties_by_age_groups %>%
transform(ratio = ifelse(expected_counts > group_counts,-ratio,ratio ) )
p5=ggplot(aes(x=Age_groups, y=value, group=variable), data = casualties_by_age_groups_reshape) + geom_point(aes(shape = variable, fill = variable),size=6) +
geom_line(aes(linetype=variable, size=variable)) +
xlab("Age Groups") +
ylab("Number of Casualties (Fatalities and Injuries)") +
scale_x_discrete(labels=c("0-19","20-39","40-59","60-79","80-89","90+")) +
scale_linetype_manual(values=c(1,2), guide = guide_legend(title = ""), labels=c("Observed Counts","Expected Counts")) +
scale_size_manual(values=c(1, 1.5), guide=F)+
scale_shape_manual(values=c(21,23), guide=F) +
scale_fill_manual(values=(getPalette(2)), guide=F) +
theme_minimal() +
theme(axis.text = element_text(size = 10),
axis.title =element_text(size=12), legend.text=element_text(size=10))
p6=ggplot(aes(x=Age_groups, y=ratio), data = casualties_by_age_groups) + geom_col(aes(fill = ifelse(ratio < 0,"Decreased Risk","Increased Risk") )) +
scale_y_continuous(lim=c(-12,12), breaks = seq(-12,12,4), labels=c("-12","-8","-4","1","4","8","12")) +
scale_x_discrete(labels=c("0-19","20-39","40-59","60-79","80-89","90+")) +
xlab("Age Groups") +
ylab("Relative Risk") +
scale_fill_manual(values=c("darkgreen","red"), guide = guide_legend(title = ""))+
theme_minimal() +
theme(axis.text = element_text(size = 10), axis.title =element_text(size=12), legend.text=element_text(size=10), panel.grid.minor = element_blank())
grid.arrange(p5,p6,ncol=2, widths=c(6,5), top=textGrob("Relative Risk for Age-Groups", gp=gpar(fontsize=14, face = "bold")))
If we take the average age distribution in the United States based on census data from 1990 to 2017 (http://www.populationpyramid.net/united-states-of-america/; 0-19: 27.6%, 20-39: 28.9%, 40-59: 19%, 60-79: 14.2%, 80-89: 2.8%, 90+: 0.57%) as the reference and assume independence between age and risk of accidents, we would expect on average the highest casualty numbers for age groups 0-19, 20-39 and 40-59, the lowest for 80-89 and 90+, and intermediate casualty numbers for age group 60-79. However, this is not consistent with the actual data, which show much lower than expected casualty numbers for age groups 0-19 and 20-39, and much higher than expected casualty numbers for age groups 60-79, 80-89 and 90+ (figure Relative Risk for Age-Groups, left plot). Plotting the corresponding relative risk ratio (the larger value divided by the smaller value for each pair of observed and expected counts; a negative sign indicates that the expected count exceeds the observed count) reveals that the former groups have a decreased casualty risk, whereas the latter age groups have an increased casualty risk. Only the age group 40-59 has casualty numbers close to the expected counts. Why? Direct vs. indirect casualties may play a role, and older people may be more prone to die from their injuries; here, one could analyse casualties separately for injuries and fatalities, and furthermore analyse direct vs. indirect casualties.
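The signed relative-risk ratio described above can be sketched on made-up numbers:

```r
# Minimal sketch (hypothetical counts) of the signed relative-risk ratio used above:
# take the larger of observed and expected as numerator, and flip the sign when the
# expected count exceeds the observed count.
signed_ratio <- function(observed, expected){
  ratio <- max(observed, expected) / min(observed, expected)
  if(expected > observed) -ratio else ratio
}
signed_ratio(100, 400)  # -4: four-fold fewer casualties than expected (decreased risk)
signed_ratio(400, 100)  #  4: four-fold more casualties than expected (increased risk)
```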
From a first glimpse at the storm events data set, I noticed that values for DAMAGE_PROPERTY and DAMAGE_CROPS are encoded as digit-letter abbreviations, such as 10K, which equals 10000. For the purpose of summing the amounts over a month or a year, each value must be transformed into the corresponding number. First, let’s check the unique set of letters used for abbreviation (e.g. “K” for thousand), so we know which numerical factors are required for the conversion.
head(df_events_victims$DAMAGE_PROPERTY)
## [1] "250K" "25K" "25K" "2.5K" "2.5K" "2.5K"
# use regex to extract the letter in the abbreviation and return the unique matches
# if required, transform factor type to character type for the regex check
regexpr = regexec("[A-Za-z]", c(df_events_victims$DAMAGE_CROPS, df_events_victims$DAMAGE_PROPERTY))
unlist(unique(regmatches(c(df_events_victims$DAMAGE_CROPS, df_events_victims$DAMAGE_PROPERTY), regexpr)))
## [1] "K" "M" "k" "T" "B" "h" "H"
The unique set comprises the letters K/k, M, T, B and h/H, plus a no-match (character(0); omit unlist() to see the no-match). Since the regular expression matches any letter, a no-match means the corresponding value is either missing or contains no letter at all. Missing values and values without a letter will be converted to NA, because we cannot deduce the corresponding real number. From the documentation, we know that “K” and “M” are abbreviations for thousand and million, respectively. “B” is the canonical abbreviation for billion. However, the meaning of “H” and “T” is somewhat unclear. I assume that “T” refers to trillion (the canonical letter abbreviation in finance), whereas “H” is equivalent to hundred (from hecto). Nevertheless, let’s extract the values containing T or h/H and use the numeric part of the string to help decide whether T and h/H really correspond to trillion and hundred, respectively. I expect the numeric part for trillion values to be in the range of a tenth to a thousandth (0.1-0.001), because otherwise the amount would simply be too big to be true. On the other hand, I expect the numeric part for h/H values to be large enough that the total amount reaches a thousand dollars or more, since I doubt that costs of 1000 dollars or lower would have been recorded. Of course, my assumptions might as well be wrong.
# to adapt, change the select argument to "DAMAGE_CROPS" and the regexec pattern to "T" or "h"; returns a boolean vector used to extract the damage entries whose abbreviation contains the given letter (here, "H")
check = apply(subset(df_events_victims, select = "DAMAGE_PROPERTY"), 1, function(x){ length(unlist(regmatches(x, regexec("H", x)))) == 1 })
unique(df_events_victims$DAMAGE_PROPERTY[check] )
## [1] "2H" "5H"
Since there is only one hit for “T”, with 0 as the corresponding numeric value (“0T”), we don’t need to decide on the true meaning of “T”. However, the hits for “h”/“H” have single-digit numeric parts (“2h”, “5h”, “5H”). As such, I revise my assumption: “h” and “H” more plausibly represent “hundred thousand” rather than “hundred”.
value_transformation = function(value){
# there seems to be a bug with the hash() function. Using the standard definition of the hash object hash(key = value) with h as key returns an error (The empty character string, '', cannot be used for a key at key(s): 1). Somehow this is only true for h as key. This problem can be circumvented by using the .set() notation
letter_value = hash()
.set(letter_value, keys=c("K","k","M","B","T","h","H"), values=c(1000,1000,1000000,1000000000,1000000000000,100000,100000))
# use regex to extract the letter from digit-letter abreviation
regexpr_letter = regexec("[aA-zZ]", value)
match_letter = unlist(regmatches(value, regexpr_letter))
# use regex to extract the digit from digit-letter abreviation
regexpr_digit = regexec("\\.?[0-9]+", value)
match_digit = as.numeric(unlist(regmatches(value, regexpr_digit)))
if(value == "0"){
output = 0
}
# check for digit and letter match in corresponding damage entry; both required for valid transformation
else if(length(match_letter) == 1 & length(match_digit) == 1){
# convert to numeric value
output = letter_value[[match_letter]]*match_digit
}else{
output = NA
}
output
}
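As an aside, the same lookup works without the hash package (and thus without its key bug) by using a named numeric vector. A minimal self-contained sketch with the multipliers assumed above (to_dollars is a hypothetical stand-in for value_transformation):

```r
multiplier = c(K = 1e3, k = 1e3, M = 1e6, B = 1e9, T = 1e12, h = 1e5, H = 1e5)
to_dollars = function(value) {
  if (identical(value, "0")) return(0)
  letter = regmatches(value, regexec("[A-Za-z]", value))[[1]]
  number = as.numeric(regmatches(value, regexec("[0-9]*\\.?[0-9]+", value))[[1]])
  # both a letter and a number are required for a valid transformation
  if (length(letter) == 1 && length(number) == 1) unname(multiplier[letter] * number) else NA
}
sapply(c("2K", "0.5M", "0", "foo"), to_dollars, USE.NAMES = FALSE)
# 2000 500000 0 NA
```

unname() strips the key name that the named-vector lookup attaches to the result.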
# value_transformation() must be wrapped with Vectorize() when used inside dplyr's mutate_at, which passes whole columns (vectors) as input
value_transformation = Vectorize(value_transformation, USE.NAMES = F)
df_events_victims = df_events_victims %>% mutate_at(vars(DAMAGE_PROPERTY,DAMAGE_CROPS),funs(value_transformation))
# convert dataset including only the damage variables to long format (used for plotting the distribution of damage amount)
damage_reshape = gather(subset(df_events_victims, select = c("DAMAGE_PROPERTY", "DAMAGE_CROPS")), key = "damage_category", value = "amount", c("DAMAGE_PROPERTY", "DAMAGE_CROPS"))
# calculates the total damage amount
total_damage_by_category = subset(damage_reshape, !is.na(damage_reshape$amount)) %>% group_by(damage_category) %>% summarise(total = sum(amount))
# +1 transformation of x-values to cope with log10 transformation of x-axis (this will include x-values = 0 )
p7 = ggplot(aes(x=amount+1), data = subset(damage_reshape, !is.na(amount))) + geom_histogram(aes(fill = damage_category),binwidth = 0.06) +
xlab("Total Amount ($)") +
ylab("Counts") +
theme_minimal() +
# breaks are shifted +1 because of the +1 transformation of the x-values; the labels are shifted -1 so that the axis shows the original values
scale_x_log10(breaks = c(1, 101, 1001, 10001, 100001, 1000001, 3622641, 10000001, 100000001), labels=c(0, 100, 1000, 10000, 100000, 1000000, 3622640, 10000000, 100000000)) +
scale_y_log10() +
scale_fill_manual(name = "Damage Category", values=getPalette(2))+
scale_color_manual(name = "statistics", values=c("mean"="black", "median"="black")) +
scale_linetype_manual(name = "statistics", values = c("mean"=1, "median"=2)) +
theme(strip.text = element_text(size=12), axis.text.x = element_text(angle = 45, hjust = 1,size = 10), axis.text.y = element_text(size = 10),
axis.title =element_text(size=12), legend.text=element_text(size=10), legend.title=element_text(size=12))+
facet_wrap(~damage_category)
# add lines representing mean and median damage amounts (+1 transformation of median, because median = 0; otherwise the line would not intersect 0 in the plot)
p7 = p7 + geom_vline(data =damage_reshape[which(damage_reshape$damage_category == "DAMAGE_PROPERTY"),] ,aes(color = "mean", xintercept = mean(damage_reshape$amount[!is.na(damage_reshape$amount) & damage_reshape$damage_category == "DAMAGE_PROPERTY"]), linetype = "mean" ) ) +
geom_vline(data =damage_reshape[which(damage_reshape$damage_category == "DAMAGE_PROPERTY"),] ,aes(color = "median", xintercept = median(damage_reshape$amount[!is.na(damage_reshape$amount) & damage_reshape$damage_category == "DAMAGE_PROPERTY"]+1), linetype = "median" ) )
p7 = p7 + geom_vline(data =damage_reshape[which(damage_reshape$damage_category == "DAMAGE_CROPS"),] ,aes(color = "mean", xintercept = mean(damage_reshape$amount[!is.na(damage_reshape$amount) & damage_reshape$damage_category == "DAMAGE_CROPS"]), linetype = "mean" ) ) +
geom_vline(data =damage_reshape[which(damage_reshape$damage_category == "DAMAGE_CROPS"),] ,aes(color = "median", xintercept = median(damage_reshape$amount[!is.na(damage_reshape$amount) & damage_reshape$damage_category == "DAMAGE_CROPS"]+1), linetype = "median" ) )
p8 = ggplot(aes(x = damage_category, y=total/1000000000), data = total_damage_by_category) + geom_col(aes(fill = damage_category)) +
ylab("Amount (Billion $)") +
xlab("Damage Category")+
theme_minimal() +
theme(axis.text = element_text(size = 10), axis.title =element_text(size=12)) +
scale_x_discrete(labels=c("CROPS","PROPERTY")) +
scale_fill_manual(values=getPalette(2))+
scale_y_sqrt(breaks = c(69, 1000, 2000, 3000, 3445)) +
guides(fill=F)
grid.arrange(p7,p8, ncol=2, widths = c(22,6), top=textGrob("Distribution of Damage Amount incurred by Extreme Weather", gp=gpar(fontsize=14, face = "bold")))
The distribution of costs incurred by extreme weather is highly right-skewed for both damage categories (crops vs. property). For the vast majority of events no costs were incurred, which is also reflected by a median of 0 for both damage categories. Overall, more of the recorded extreme weather events incurred high costs on property than on crops, which is reflected by a larger mean for property damage. Interestingly, irrespective of the damage category, the incurred costs seem to form four to five clusters: no costs, small costs (100-1000 USD), medium to higher costs (10,000-1,000,000 USD) and high to extreme costs (double-digit millions to billions). The plot on the right side of figure 3 shows the total costs incurred over the period 1950-2017. At over 3000 billion USD, the total costs for property damage are much higher than those for crop damage.
Let’s have a look at the accumulated damage amount, the total number of casualties and the frequency of extreme weather events for each state over the period from 1950-2017. The distribution of casualty numbers is highly skewed, with most values in the low range and a few very large values. For better visualization, the casualty numbers will be binned into groups representing different magnitudes of casualties (very low, low, medium, etc.). Alternatively, a log10 transformation would also be suitable.
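The binning relies on base R’s cut(), which assigns each value to a half-open interval (lower, upper]. A quick sketch with hypothetical casualty counts (same breaks as used below; dig.lab = 6 keeps the labels out of scientific notation):

```r
counts = c(50, 900, 25000, 420000)
# each count falls into the half-open interval (lower, upper]
bins = cut(counts, breaks = c(0, 100, 1000, 10000, 50000, 100000, 500000), dig.lab = 6)
as.character(bins)
# "(0,100]" "(100,1000]" "(10000,50000]" "(100000,500000]"
```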
# transform states/county data to data frame with fortify(); fortify takes a model, which you want to convert, and region defines the variable by which to split the data
# fips codes essential for county-specific mapping, because county names aren't unique
states = fortify(usa_composite(),region="fips_state")
cmap = fortify(counties_composite(), region="fips")
# combine property (DAMAGE_PROPERTY) and crops damage (DAMAGE_CROPS) to calculate the total damage for each event
# this conversion will be required in other reshaped variants of the main data frame df_events_victims.
df_total_damage = df_events_victims %>%
# converts NA to 0 for one damage category if other category is not NA; This avoids NA result when summing number and NA
mutate_at(vars(DAMAGE_PROPERTY), funs(ifelse(!is.na(DAMAGE_CROPS) & (is.na(DAMAGE_PROPERTY) | DAMAGE_PROPERTY==0),0, DAMAGE_PROPERTY ))) %>%
mutate_at(vars(DAMAGE_CROPS), funs(ifelse(!is.na(DAMAGE_PROPERTY) & (is.na(DAMAGE_CROPS) | DAMAGE_CROPS==0),0, DAMAGE_CROPS ))) %>%
transform(damage_combined = DAMAGE_PROPERTY+DAMAGE_CROPS) %>%
# remove NA to avoid NA result when summing number and NA
filter(!is.na(damage_combined))
# returns total damage amount for each state
total_damage_by_state = subset(df_total_damage, select = c("STATE","STATE_FIPS","damage_combined")) %>%
# state FIPS codes are two digits; formatC() preserves the 2-digit format and pads 1-digit codes with a leading 0
transform(FIPS = formatC(STATE_FIPS, digits=2,width=2,format = "d", flag=0) ) %>%
filter(FIPS %in% states$id) %>%
group_by(STATE,FIPS) %>%
summarise(total_damage = sum(damage_combined)) %>%
ungroup()
# returns total casualty number for each state
total_casualty_by_state = subset(casualties_reshape, select=c("STATE", "FIPS", "Fatalities", "Injuries")) %>%
transform(casualties_combined = Fatalities+Injuries) %>%
filter(!is.na(FIPS)) %>%
group_by(STATE,FIPS) %>%
filter(FIPS %in% states$id) %>%
summarise(total_casualties = sum(casualties_combined)) %>%
ungroup()
# define discrete groups for casualty numbers
total_casualty_by_state["casualty_bins"] = cut(total_casualty_by_state$total_casualties, breaks=c(0,100,1000,10000,50000,100000,500000), dig.lab = 6)
# return geographic coordinates for each extreme weather event; required to calculate the probability density function for events in US and to represent probabilities of extreme weather for different geographic locations (via stat_density2d() function)
event_coordinates = subset(df_events_victims, select=c("BEGIN_LAT", "BEGIN_LON" )) %>%
# filter for the 48 contiguous states (- Alaska and Hawaii) to avoid clipping artifacts of polygons in stat_density2d() function
filter(BEGIN_LAT>=24.396308 & BEGIN_LAT<=49.384358) %>%
filter(BEGIN_LON > -124.848974, BEGIN_LON < -66.885444)
p9 = ggplot()+
geom_map(data=states,map=states,aes(x=long,y=lat, map_id =id), color ="black", size=0.4,fill=NA) +
geom_map(data = total_casualty_by_state, map=states, aes(fill= casualty_bins, map_id=FIPS), na.value="white") +
ggtitle("Total Casualties (1950-2017)") +
# the log10 for the total damage min value is 8.094
scale_fill_viridis(name = "Number of Casualties", option = "A", discrete = TRUE, labels = c("< 100", "100-1000", "1000-10000", "10000-50000","50000-100000", "> 100000"),
guide =guide_legend(title.position = 'top', title.hjust = 0.5)) +
theme_minimal(base_size = 12) +
theme(axis.line = element_blank(), axis.text = element_blank(),
axis.ticks = element_blank(), axis.title = element_blank(), plot.title= element_text(hjust=0.5, size = 14, face = "bold"),
panel.grid.major = element_blank(), legend.position = "bottom", legend.text=element_text(size=12), legend.title=element_text(size=14) ) +
coord_map()
p10 = ggplot()+
geom_map(data=states,map=states,aes(x=long,y=lat, map_id =id), color ="black", size=0.4,fill=NA) +
geom_map(data = total_damage_by_state, map=states, aes(fill=log10(total_damage), map_id=FIPS), na.value="white") +
ggtitle("Total Damage (1950-2017)") +
# the log10 for the total damage min value is 8.094
scale_fill_viridis(name = "Amount in $, log scale", option = "A",
limits=c( log10( 10**8 ), log10(max(total_damage_by_state$total_damage))),
breaks=c(log10(10**8),log10(10**9),log10(10**10),log10(10**11),log10(10**12)),
labels=c("$100 Million","$1 Billion", "$10 Billion","$100 Billion", "$1000 Billion"),
guide = guide_colorbar(direction = "horizontal",barheight = unit(2, units = "mm"),barwidth = unit(50, units = "mm"), title.position = 'top', title.hjust = 0.5, label.hjust = 1.0, label.vjust = 1.0 )) +
theme_minimal(base_size = 12) +
theme(axis.line = element_blank(), axis.text = element_blank(),
axis.ticks = element_blank(), axis.title = element_blank(), plot.title= element_text(hjust=0.5, size = 14, face = "bold"),
panel.grid.major = element_blank(), legend.position = "bottom", legend.text=element_text(angle = 45, size=12), legend.title=element_text(size=14)) +
coord_map()
p11 = ggplot()+
geom_map(data=states,map=states,aes(x=long,y=lat, map_id =id), color ="black", size=0.4,fill=NA) +
stat_density2d(aes(y=BEGIN_LAT, x=BEGIN_LON, fill = ..level.., alpha = ..level.. ), size = 0.01, bins = 20, data = event_coordinates, geom = 'polygon')+
scale_alpha(guide = FALSE)+
ggtitle("Extreme Weather Probability Estimate") +
scale_fill_viridis(name = "Density Level",option = "A",
guide = guide_colorbar(direction = "horizontal", barheight = unit(2, units = "mm"),barwidth = unit(50, units = "mm"), title.position = 'top', title.hjust = 0.5, label.hjust = 1.0, label.vjust = 1.0))+
theme_minimal(base_size = 12) +
theme(axis.line = element_blank(), axis.text = element_blank(),
axis.ticks = element_blank(), axis.title = element_blank(), plot.title= element_text(hjust=0.5, size = 14, face = "bold"),
panel.grid.major = element_blank(), legend.position = "bottom", legend.text=element_text(angle = 45,size=12), legend.title=element_text(size=14)) +
coord_map()
plot_grid(p9,p10,p11, align = "hv", ncol = 3)
The density plot (figure on the right) depicts the density of events as a function of geographic coordinates, which essentially represents the probability estimate for occurrences of combined weather events (based on data from 1950-2017) for a particular area. It shows the highest probabilities for the North East, Mid Atlantic, South East, Mid West, Mid South and large parts of the Central region. If we zoom in, we can identify true hotspots (patches with a density level >= 0.004), e.g., most of Oklahoma and Kansas, parts of Texas, and the region encompassing New Jersey, Maryland, Washington, DC and Delaware. These areas seem to carry the highest probability, or risk, of being confronted with extreme weather. As expected, event density correlates well with total damage amount and total casualties: states or regions with the highest probability tend to have higher monetary damage and higher casualty numbers caused by extreme weather. However, we also observe some irregularities in this correlation. E.g., Louisiana, California and Florida exhibit very high damage costs but proportionally lower event densities. Similarly, casualty numbers and event density seem disproportionate for Nevada. This suggests that those states experienced “fewer” but much more severe events than other states, although high damage in states with high event frequency might also be attributable to a few severe events. An overall higher probability of extreme weather does not necessarily imply a higher probability of the most severe events!
We know that the geographic distribution of extreme weather is heavily biased towards the eastern part of the United States when we accumulate events over the past 70 years. Equally interesting, however, is the change in event frequency over time. Do we observe any obvious increase or decrease in frequency for the different event categories?
# load US cities data and filter for cities with a population of 500,000 or greater
data(us.cities)
us_cities = us.cities
us_cities = us_cities %>%
filter(pop >= 500000)
# function to create GIFs that show the number of events over time (for 5 year periods)
myplot_GIF = function(period, event_filter="All Combined", guide=T) {
# period (string): decades_buckets values (specifies the period for which the number of events should be plotted)
# event_filter (string): specifies which storm event category to use from the df_events_victims dataset (e.g. Tornado); no specification (all categories combined) is the default
# guide (bool): specifies whether legend should be included in the plot
storm_dataset = subset(df_events_victims, select=c("STATE","STATE_FIPS", "CZ_FIPS", "decades_buckets", "EVENT_CATEGORY")) %>%
mutate_at(vars(STATE), funs(tolower)) %>%
transform(FIPS = paste(formatC(STATE_FIPS, digits=2,width=2,format = "d", flag=0), formatC(CZ_FIPS, digits=3,width=3,format = "d", flag=0), sep="") ) %>%
transform(FIPS=as.character(FIPS)) %>%
select(c("STATE", "FIPS", "decades_buckets", "EVENT_CATEGORY"))
# calculate the 90th percentile of event count distribution (either for a specified category or for combined events) to define the upper limit for number of events in the plots
if(event_filter != "All Combined"){
upper_limit = storm_dataset %>%
group_by(FIPS,decades_buckets, EVENT_CATEGORY) %>%
filter(EVENT_CATEGORY == event_filter ) %>%
summarise(counts = n()) %>%
ungroup()
}else{
upper_limit = storm_dataset %>%
group_by(FIPS,decades_buckets) %>%
summarise(counts = n()) %>%
ungroup()
}
upper_limit = quantile(upper_limit$counts,probs = 0.9)
# determine counts for combined events, or group by specified category
if(event_filter != "All Combined") {
storm_dataset = storm_dataset %>% filter(EVENT_CATEGORY == event_filter) %>%
group_by(FIPS,decades_buckets, EVENT_CATEGORY)
}else{
storm_dataset = storm_dataset %>%
group_by(FIPS,decades_buckets)
}
storm_dataset = storm_dataset %>%
filter(decades_buckets == period) %>%
summarise(events = n())
storm_dataset = storm_dataset %>%
transform(events = ifelse(events > upper_limit, upper_limit, events)) %>%
ungroup() %>%
ungroup()
# plot reshaped data
ggplot()+
# use regex to transform the period from the "(year,year]" format (standard cut() output) into the more appealing "year-year" format
ggtitle(gsub('^(\\()([0-9]{4}),([0-9]{4})(\\])$', '\\2-\\3', period)) +
geom_map(data=cmap, map=cmap, aes(x=long, y=lat, map_id =id), fill="white", color="black", size = 0.05) +
geom_map(data = storm_dataset, map=cmap, aes(fill=events, map_id=FIPS), na.value="white") +
geom_map(data=states,map=states,aes(x=long,y=lat, map_id =id), color ="black", size=0.8,fill=NA) +
geom_point(aes(x=long, y=lat,color = ""), data=us_cities, size=2) +
coord_map() +
theme_minimal(base_size = 12) +
theme(axis.line = element_blank(), axis.text = element_blank(),
axis.ticks = element_blank(), axis.title = element_blank(), plot.title= element_text(hjust=0.5, size = 16, face = "bold"),
panel.grid.major = element_blank(), legend.position = "bottom", legend.text=element_text(size=14), legend.title=element_text(size=14)) +
scale_color_manual(values = "red" , guide = guide_legend(title = "Over 0.5 Million Population") ) +
scale_fill_viridis(name = "Number of Events", option="D",na.value="white", limits=c(1,upper_limit),
guide = guide_colorbar(direction = "horizontal",barheight = unit(2, units = "mm"),barwidth = unit(50, units = "mm"), title.position = 'top', title.hjust = 0.5 )) +
if(guide == F){
guides(fill=F, color = F)
}
}
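The period-label regex inside myplot_GIF() can be checked in isolation; a quick sketch (pretty_period is a hypothetical helper name):

```r
# turn cut()'s "(start,end]" interval label into a "start-end" title
pretty_period = function(p) gsub('^(\\()([0-9]{4}),([0-9]{4})(\\])$', '\\2-\\3', p)
pretty_period("(2010,2015]")
# "2010-2015"
```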
# convert legend into plotting objects in order to individualize multiplots
get_legend = function(plot){
# build all plot grobs
grob_table = ggplot_gtable(ggplot_build(plot))
# identify list in grob_table that stores information about the legend
legend_list = which(sapply(grob_table$grobs, function(x) x$name) == "guide-box")
# extract legend object (contains all information about the plot)
legend = grob_table$grobs[[legend_list]]
return(legend)
}
make_GIF = function(event){
# get legend plot
legend = get_legend(myplot_GIF("(2010,2015]", event))
# adjust legend x and y axis orientation on plotting canvas
legend$vp$x = unit(1.5, 'npc')
legend$vp$y = unit(1.5, 'npc')
oopt = ani.options(interval = 1)
saveGIF(
{for (period in unique(df_events_victims$decades_buckets[!is.na(df_events_victims$decades_buckets)]) ) {
time_series = myplot_GIF(period, event,F)
first_record = myplot_GIF("(1949,1955]",event,F)
last_record = myplot_GIF("(2010,2015]",event,F)
print(grid.arrange(first_record,last_record,time_series,legend, heights=c(10,1),widths =c(1,1,1),ncol=3, top=textGrob(paste("Number of Extreme Weather Reports for the Category",event), gp=gpar(fontsize=24))))
ani.pause()
}},movie.name = paste("Extreme_Weather_timeseries_", event, ".gif", sep=""),ani.width = 1500, ani.height = 500)
}
# make GIF files for different Weather categories
for(event in c("Tornado", "Hail", "Storm", "All Combined")){
make_GIF(event)
}
knitr::include_graphics('Extreme_Weather_timeseries_Tornado.gif')
knitr::include_graphics('Extreme_Weather_timeseries_Storm.gif')
knitr::include_graphics('Extreme_Weather_timeseries_Hail.gif')
knitr::include_graphics('Extreme_Weather_timeseries_All Combined.gif')